Overview

This package geocodes address strings that are protected health information (PHI). Most institutional review boards will not allow the use of online geocoders for such data. The package maintains HIPAA compliance by matching addresses exactly against offline data from the Cincinnati Area Geographic Information System (CAGIS). Matching against this local reference data drastically improves geocoding accuracy, but it is only available for addresses with a ZIP code beginning with 450, 451, or 452.
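Because only ZIP codes beginning with 450, 451, or 452 are covered, it can help to screen addresses before geocoding. The following is a minimal sketch, not part of the package: in_cagis_zip_range is a hypothetical helper, it assumes the ZIP code (5-digit or ZIP+4) is the last element of the address string, and the second address is made up for illustration.

```r
# Hypothetical helper (not part of geocodeCAGIS): TRUE when an address string
# ends in a ZIP code (5-digit or ZIP+4) that begins with 450, 451, or 452.
# Assumption: the ZIP code is the last element of the string.
in_cagis_zip_range <- function(address) {
  # A non-digit (or the string start) must precede the ZIP so that longer
  # digit runs are not mistaken for a 5-digit ZIP code.
  pattern <- "(^|[^0-9])45[0-2][0-9]{2}(-[0-9]{4})?[[:space:]]*$"
  grepl(pattern, address)
}

addresses <- c("3333 Burnet Ave, Cincinnati, OH 45229",
               "1 Main St, Columbus, OH 43215")  # made-up out-of-range address

# Logical vector, one element per address
in_cagis_zip_range(addresses)
```

Addresses that fail this check fall outside the CAGIS reference data and would need another geocoding approach.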

Installing

R Package

Install the package by running the following in R:

remotes::install_github('cole-brokamp/geocodeCAGIS')

Python

The package relies on a call to Python to parse the address, so both Python and the usaddress Python library must be installed (pip install usaddress).
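One way to confirm this prerequisite from within R is sketched below. has_usaddress is a hypothetical helper, not a package function, and it assumes the interpreter is reachable on the PATH under the name python; the package itself may locate Python differently.

```r
# Hypothetical helper (not part of geocodeCAGIS): TRUE when `python_cmd` is on
# the PATH and that interpreter can import the usaddress library.
# Assumption: the interpreter is called "python"; pass "python3" if needed.
has_usaddress <- function(python_cmd = "python") {
  python_path <- unname(Sys.which(python_cmd))
  if (!nzchar(python_path)) {
    return(FALSE)  # no such executable on the PATH
  }
  # Exit status 0 means the import succeeded; output is discarded.
  status <- system2(python_path, c("-c", shQuote("import usaddress")),
                    stdout = FALSE, stderr = FALSE)
  isTRUE(status == 0L)
}

if (!has_usaddress()) {
  message("Python or usaddress not found; try: pip install usaddress")
}
```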

Database Files

The package requires an address database stored as an R/sysdata.rda file, which is included with the package.

Alternatively, you can build the system data file yourself from updated CAGIS files. Find the path to the folder that contains the build scripts by running system.file(file.path('extdata'),package='geocodeCAGIS',mustWork=TRUE) in R after loading the geocodeCAGIS package. Running bash build_CAGIS_address_file.sh downloads the CAGIS master address file, unzips it, and extracts the shapefile to a CSV. The script then calls make_CAGIS_database.R to parse all listed CAGIS addresses and save them as a .RData file. Unused files are deleted afterwards.

If you use your own address reference files, make sure to run R CMD BATCH --vanilla make_sysdata.R and rebuild the package with the updated R/sysdata.rda.

Warning: These scripts were built for OS X or Linux. Both take a long time to run, so please review the files beforehand. Making your own CAGIS database is not for the faint of heart.

Geocoding

Address String Formatting

Example usage

Single address with only coordinates:

library(geocodeCAGIS)
geocodeCAGIS('3333 Burnet Ave, Cincinnati, OH 45229')

Include all optional output:

geocodeCAGIS('3333 Burnet Ave, Cincinnati, OH 45229',
             return.score=TRUE,
             return.call=TRUE,
             return.match=TRUE)

Batch Geocoding

This package comes with an executable Rscript file that can be run to batch geocode a CSV file. Find the location of the file by running the following in R:

system.file(file.path('extdata/batch_geocodeCAGIS.R'),package='geocodeCAGIS')

Make the script executable with chmod, then run it to look over the available options.
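The permission change can also be made without leaving R. This is a sketch under stated assumptions: make_executable is a hypothetical helper, the mode 0755 is one reasonable choice, and you need write access to the installed package directory for it to succeed.

```r
# Hypothetical helper (not part of geocodeCAGIS): set the executable bit on a
# script and report whether the current user can now execute it.
make_executable <- function(path) {
  if (!nzchar(path) || !file.exists(path)) {
    stop("script not found: ", path)
  }
  Sys.chmod(path, mode = "0755")  # rwx for owner, r-x for group and others
  # file.access() returns 0 when the requested access (1 = execute) is allowed.
  invisible(unname(file.access(path, mode = 1) == 0))
}

# system.file() returns "" when the package or file is not installed.
script <- system.file("extdata", "batch_geocodeCAGIS.R",
                      package = "geocodeCAGIS")
if (nzchar(script)) make_executable(script)
```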



cole-brokamp/geocodeCAGIS documentation built on May 13, 2019, 8:50 p.m.